Word Re-Embedding via Manifold Dimensionality Retention
Abstract
Word embeddings seek to recover a Euclidean metric space by mapping words into vectors, starting from word co-occurrences in a corpus. They may underestimate the similarity between nearby words and overestimate it between distant words in the Euclidean metric space. In this paper, we re-embed pre-trained word embeddings with a manifold learning stage that retains dimensionality. We show that this approach is theoretically founded in the metric recovery paradigm, and show empirically that it can improve on state-of-the-art embeddings in word similarity tasks by 0.5 to 5.0 percentage points, depending on the original space.
Similar resources
Word, graph and manifold embedding from Markov processes
Authors: Tatsunori Hashimoto, David Alvarez-Melis, Tommi S. Jaakkola
Continuous vector representations of words and objects appear to carry surprisingly rich semantic content. In this paper, we advance both the conceptual and theoretical understanding of word embeddings in three ways. First, we ground embeddings in semantic spaces studied in the cognitive-psychometric literature and introduce new evaluation tasks. Second, in contrast to prior work, we take metric rec...
Supervised Manifold Learning with Incremental Stochastic Embeddings
In this paper, we introduce an incremental dimensionality reduction approach for labeled data. The algorithm incrementally samples in latent space and chooses a solution that minimizes the nearest neighbor classification error taking into account label information. We introduce and compare two optimization approaches to generate supervised embeddings, i.e., an incremental solution construction ...
Locally Linear Embedded Eigenspace Analysis
The existing nonlinear local methods for dimensionality reduction yield impressive results in data embedding and manifold visualization. However, they also open up the problem of how to define a unified projection from new data to the embedded subspace constructed by the training samples. Thinking globally and fitting locally, we present a new linear embedding approach, called Locally Embedded ...
Thought Chart: Tracking Dynamic EEG Brain Connectivity with Unsupervised Manifold Learning
Assuming that the topological space containing all possible brain states forms a very high-dimensional manifold, this paper proposes an unsupervised manifold learning framework to reconstruct and visualize this manifold using EEG brain connectivity data acquired from a group of healthy volunteers. Once this manifold is constructed, the temporal sequence of an individual’s EEG activities can the...
Word Embeddings as Metric Recovery in Semantic Spaces
Continuous word representations have been remarkably useful across NLP tasks but remain poorly understood. We ground word embeddings in semantic spaces studied in the cognitive-psychometric literature, taking these spaces as the primary objects to recover. To this end, we relate log co-occurrences of words in large corpora to semantic similarity assessments and show that co-occurrences are inde...